library(DESeq2)
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated,
eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match,
mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames,
sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:base’:
expand.grid, I, unname
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats
Attaching package: ‘MatrixGenerics’
The following objects are masked from ‘package:matrixStats’:
colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins,
colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads,
colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges,
colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
colWeightedMeans, colWeightedMedians, colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs,
rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs,
rowMeans2, rowMedians, rowMins, rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads,
rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: ‘Biobase’
The following object is masked from ‘package:MatrixGenerics’:
rowMedians
The following objects are masked from ‘package:matrixStats’:
anyMissing, rowMedians
library(magrittr)
library(EnhancedVolcano)
Loading required package: ggplot2
Loading required package: ggrepel
dds4v5 = readRDS("outputs/dds4v5.rds")
dds4vM = readRDS("outputs/dds4vM.rds")
dds5vM = readRDS("outputs/dds5vM.rds")
Here we run DESeq2 to get our DEGs for our contrasts of choice: We will compare the three different timepoints against each other, generating a total of 3 contrasts. 1. protruding mouth vs day 4 2. day 4 vs day 5 3. protruding mouth vs day 5
dds4v5$stageName <- droplevels(dds4v5$stageName)
dds4vM$stageName <- droplevels(dds4vM$stageName)
dds5vM$stageName <- droplevels(dds5vM$stageName)
dds4v5 = DESeq(dds4v5)
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
dds4vM = DESeq(dds4vM)
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
dds5vM = DESeq(dds5vM)
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
res4v5 <- results(dds4v5, test = "Wald")
res4vM <- results(dds4vM, test = "Wald")
res5vM <- results(dds5vM, test = "Wald")
res5vM = lfcShrink(dds=dds5vM, coef = "stageName_Day.5_vs_Protruding.mouth",res = res5vM, type = "ashr")
using 'ashr' for LFC shrinkage. If used in published research, please cite:
Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2.
https://doi.org/10.1093/biostatistics/kxw041
res4vM = lfcShrink(dds=dds4vM, coef = "stageName_Day.4_vs_Protruding.mouth",res = res4vM, type = "ashr")
using 'ashr' for LFC shrinkage. If used in published research, please cite:
Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2.
https://doi.org/10.1093/biostatistics/kxw041
res4v5 = lfcShrink(dds4v5, contrast = c("stageName", "Day.4", "Day.5"),res = res4v5, type="ashr")
using 'ashr' for LFC shrinkage. If used in published research, please cite:
Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2.
https://doi.org/10.1093/biostatistics/kxw041
Here we us LFCshrink via ashr to shrink our logfold2 changes from the DESeq outputs with respect to our contrasts. We do not use normal type as ashr and apeglm both outperform normal here.
Here we glance into our data to see if the model is well fit to the data:
plotDispEsts(dds4v5)
plotDispEsts(dds4vM)
plotDispEsts(dds5vM)
Everything looks mostly okay here; it mostly fits the “normal look” here.
(for (i in order(res5vM$padj, decreasing = F)[1:5]){
plotCounts(dds5vM, gene=i, intgroup="stageName", main = "")
title(rowData(dds5vM)[rownames(res5vM[i,]),]$SYMBOL)
})
NULL
fiveMplot = EnhancedVolcano(res5vM,
lab = rowData(dds5vM)$SYMBOL,
x = 'log2FoldChange',
y = 'pvalue')
Warning: One or more p-values is 0. Converting to 10^-1 * current lowest non-zero p-value...
fiveMplot
ggsave("outputs/fivevsMvolcano.pdf", height = 16, width = 8)
(for (i in order(res4vM$padj, decreasing = F)[1:5]){
plotCounts(dds4vM, gene=i, intgroup="stageName", main = "")
title(rowData(dds4vM)[rownames(res4vM[i,]),]$SYMBOL)
})
NULL
fourMplot = EnhancedVolcano(res4vM,
lab = rowData(dds4vM)$SYMBOL,
x = 'log2FoldChange',
y = 'pvalue')
Warning: One or more p-values is 0. Converting to 10^-1 * current lowest non-zero p-value...
fourMplot
ggsave("outputs/fourvsMvolcano.pdf", height = 16, width = 8)
(for (i in order(res4v5$padj, decreasing = F)[1:5]){
plotCounts(dds4v5, gene=i, intgroup="stageName", main = "")
title(rowData(dds4v5)[rownames(res4v5[i,]),]$SYMBOL)
})
NULL
fourfiveplot = EnhancedVolcano(res4v5,
lab = rowData(dds4v5)$SYMBOL,
x = 'log2FoldChange',
y = 'pvalue')
fourfiveplot
ggsave("outputs/fourvsfivevolcano.pdf", height = 16, width = 8)
fro mall of this we do see that HOx genes are not the most DEgenes. Instead we see the two DE genes that seem to be related to the trypsin? ctrl and prss1.
Many of the other top genes are interspersed through many other gene ontologies, including fatty acid metaboloism/transport, and location specific markers i.e. (cyp3a65 is expressed in the membrane of these structures: digestive system, eye, gill, heart and ovary.)
Since this was my original question, let’s filter and generate a heatmap of just the HOX genes, and let’s see if we get some variability here.
From this we do see that unfortunately… all of the samples seem to just be relatively prolific in HOX expression across the three timepoints chosen. This may be due to the fact that these are later timepoints and are bulk samples. We capture the entire signal and as such we do not have the distinction of where these Hox genes are being expressed across the embryo, and we only know that the genes are being highly expressed. Still we do see some differences in the hox expression. Particularly, there does seem to be some differences in expression between the Protruding mouth when compared to day 4 and 5.
paste4v5= lapply(1:length(sig_4v5), \(x){
if (sig_4v5[x] == TRUE){
paste("4v5: ", res4v5[x,]$padj)
} else{
NA
}
})
Error in if (sig_4v5[x] == TRUE) { :
missing value where TRUE/FALSE needed